Discrete representation learning for handwritten text recognition

Abstract

Handwritten text recognition, i.e., the conversion of scanned handwritten documents into machine-readable text, is a complex exercise due to the variability and complexity of handwriting. A common approach in handwritten text recognition consists of a feature extraction step followed by a recognizer. In this paper, we propose a novel DNN architecture for handwritten text recognition that extracts a discrete representation from an input text-line image. The proposed model is constructed of an encoder–decoder network with an added quantization layer, which applies a dictionary of representative vectors to discretize the latent variables. The dictionary and the network parameters are trained jointly through the k-means algorithm and backpropagation, respectively. The performance of the suggested model is evaluated by conducting extensive experiments on five datasets, analyzing the effect of discrete representation on handwriting recognition. The results demonstrate that the use of feature discretization improves the performance of deep handwriting text recognition models when compared to the conventional continuous representation. Specifically, the character error rate is decreased by 22% and 21.1% on the IAM and ICFHR18 datasets, respectively.
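The quantization step described in the abstract can be illustrated with a minimal sketch. It assumes the dictionary is a small list of vectors, that each latent vector is replaced by its nearest dictionary entry under squared Euclidean distance, and that the dictionary is refreshed with one k-means (Lloyd) step. The function names, the distance measure, and the toy values are hypothetical and are not taken from the paper; the encoder, the decoder, and their training by backpropagation are omitted.

```python
# Illustrative sketch only: names, sizes, and the distance measure are
# assumptions, not the paper's implementation.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest dictionary entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def kmeans_update(latents, codebook, indices):
    """One k-means (Lloyd) step: move each entry to the mean of its assigned latents."""
    updated = []
    for k, entry in enumerate(codebook):
        members = [z for z, i in zip(latents, indices) if i == k]
        if not members:
            updated.append(list(entry))  # entries with no members are left unchanged
            continue
        dim = len(entry)
        updated.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
    return updated


# Toy data: four 2-D latent vectors and a dictionary with two entries.
latents = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
codebook = [[1.0, 1.0], [5.0, 5.0]]

indices = quantize(latents, codebook)            # discrete codes
discrete = [codebook[i] for i in indices]        # quantized latents
new_codebook = kmeans_update(latents, codebook, indices)
```

In this toy run the first two latents map to entry 0 and the last two to entry 1, and the update moves each entry to the mean of its members. In a full model the quantized latents would be passed on to the decoder; how gradients are handled at the quantizer is not stated in the abstract and is left out here.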

Similar resources

Active Learning for Historic Handwritten Text Recognition

This thesis examines the use of active learning for the task of handwritten text recognition in historical documents. Active learning is a machine learning paradigm which enables the learner to select the data that is being trained on. In domains where procuring annotated data is expensive but there are large amounts of unlabelled data available, active learning can lead to better models with t...

Sentence Boundary Detection for Handwritten Text Recognition

In the larger context of handwritten text recognition systems many natural language processing techniques can potentially be applied to the output of such systems. However, these techniques often assume that the input is segmented into meaningful units, such as sentences. This paper investigates the use of hidden-event language models and a maximum entropy based method for sentence boundary det...

Self-training for Handwritten Text Line Recognition

Off-line handwriting recognition deals with the task of automatically recognizing handwritten text from images, for example from scanned sheets of paper. Due to the tremendous variations of writing styles encountered between different individuals, this is a very challenging task. Traditionally, a recognition system is trained by using a large corpus of handwritten text that has to be transcribe...

Handwritten Text Recognition for Ancient Documents

Huge amounts of legacy documents are being published by on-line digital libraries world wide. However, for these raw digital images to be really useful, they need to be transcribed into a textual electronic format that would allow unrestricted indexing, browsing and querying. In some cases, adequate transcriptions of the handwritten text images are already available. In this work three systems ...

Handwritten Text Recognition for Historical Documents

The amount of digitized legacy documents has been rising dramatically over the last years due mainly to the increasing number of on-line digital libraries publishing this kind of documents. The vast majority of them remain waiting to be transcribed into a textual electronic format (such as ASCII or PDF) that would provide historians and other researchers new ways of indexing, consulting and que...

Journal

Journal title: Neural Computing and Applications

Year: 2023

ISSN: 0941-0643, 1433-3058

DOI: https://doi.org/10.1007/s00521-023-08445-9